Abstract

We introduce utility-directed procedures for 
mediating the flow of potentially distract- 
ing alerts and communications to computer 
users. We present models and inference pro- 
cedures that balance the context-sensitive 
costs of deferring alerts with the cost of in- 
terruption. We describe the challenge of rea- 
soning about such costs under uncertainty via 
an analysis of user activity and the content of 
notifications. After introducing principles of 
attention-sensitive alerting, we focus on the 
problem of guiding alerts about email mes- 
sages. We dwell on the problem of inferring 
the expected criticality of email and discuss 
work on the Priorities system, centering 
on prioritizing email by criticality and modu- 
lating the communication of notifications to 
users about the presence and nature of in- 
coming email. 



1 Introduction 

Multitasking computer systems provide great value 
to users by hosting numerous processes and applica- 
tions simultaneously. However, the ongoing execution 
of multiple applications often leads to environments 
fraught with a variety of notifications, including mes- 
sages from the operating system about the status and 
health of computational processes, alerts from the pri- 
mary application at focus, and from other applications 
being executed in the background. 

Beyond traditional sources of peripheral information, 
recent work on human-computer interaction highlights 
new forms of ongoing background services that can 
provide potentially useful context-sensitive
information and analysis (Breese, Heckerman, & Kadie,
1998; Czerwinski, Dumais, & Robertson et al.;
Lieberman, 1995; Horvitz, Breese, Heckerman et al.,
1998; Horvitz,
1999). Indeed, novel sources of information, as well as 
more familiar alerts about incoming email messages, 
tips about application usage, and information about 
the computer system and network may be valuable. 
However, the rendering of auxiliary information under 
uncertainty comes at the cost of potentially distracting 
the user from a primary task at the focus of attention. 

We are exploring utility-directed notification policies 
within the Attentional Systems project at Microsoft 
Research. We shall describe procedures that can pro- 
vide policies to support an automated attention man- 
ager that one day might be relied upon by computer 
users to mediate the transmission of notifications. 

We take the perspective that human attention is the 
most valuable and scarcest commodity in human- 
computer interaction. Rapid increases over the last 
two decades in computational power and network 
bandwidth, coupled with the explosion in the avail- 
ability of online content, stand in stark contrast to 
the constancy of limitations in human information pro- 
cessing. 

Characterizations of the inability of people to handle 
more than a handful of concepts in the short term are
perhaps the most critical results of twentieth-century
psychology (Miller, 1956; Waugh, 1965). Beyond gen- 
eral characterizations of cognitive limitations, psychol- 
ogists have explored the influence of various forms of 
interruption on human memory and planning, start- 
ing with the early work of Zeigarnik and Ovsiankina 
(Zeigarnik, 1927; Ovsiankina, 1928). The rich body 
of work in this realm includes studies centering on the 
use of interruptions as a tool to probe the machinery of 
memory and problem solving as well as to ascertain the 
influence of distractions on the efficiency with which 
tasks are accomplished (Gillie & Broadbent, 1989; Van 
Bergan, 1968; Posner & Konick, 1966). 

Figure 1: A Bayesian model for inferring the
probability distribution over a user's attentional
focus.

Figure 2: Extending the Bayesian network to consider
key dependencies over time.

We have been pursuing opportunities to harness in- 
ference and decision-making procedures to guide the 
rendering of notifications about messages of uncer- 
tain value. Our approach centers on developing the 
means for automatically assessing the expected util- 
ity of messages and for continuing to make inferences 
about a user's focus of attention by monitoring multi- 
ple sources of information. 

We shall focus first on the use of Bayesian models to 
infer a probability distribution over a user's focus of 
attention and harnessing such inferences to infer the 
expected cost of transmitting alerts to users. Then, 
we consider methods for inferring the informational 
benefits of alerts and the costs of deferring notifica- 
tion. After discussing principles of alerting based on a 
consideration of probability distributions over a user's 
attention and the time criticality of alerts, we shall 
present selected details of work on developing notifica- 
tion and forwarding policies for incoming email. 

2 Inference about a User's Attention 

Alerts provide potentially valuable information at a 
cost of interruption. The cost of an interruption de- 
pends on the nature of the interruption and on a user's 
current task and focus of attention. In the general 
case, a computer system is uncertain about the details 
of a user's attention. Thus, we seek to build or learn 
probabilistic models that can make inferences about a 
user's attention under uncertainty. 

We have pursued the construction of Bayesian models 
that can infer a probability distribution over a user's 
focus of attention. In building probabilistic models 
for inferring the context-sensitive cost of distraction, 
we consider a set of mutually exclusive and exhaustive 
states of attentional focus and seek to identify the cost 
of communicating an alert given a probability
distribution over the states of a user's attention. Such states
of attention can be formulated as a set of prototypical 
situations or more abstract representations of a set of 
distinct classes of cognitive challenges being addressed 
by a user. Alternatively, we can formulate models that 
make inferences about a continuous measure of atten- 
tional focus, or models that directly infer a probability 
distribution over the cost of interruption for different 
types of notifications. In our initial approach to
modeling a user's attention, we use Bayesian networks
that can be used to infer the probability of alternate 
activity contexts based on a set of observations about 
a user's activity and location. 

Figure 1 displays a Bayesian network for inferring 
a user's focus of attention for a single time period. 
States of the critical variable, Focus of Attention,
refer to desktop and nondesktop contexts. Sample
attentional contexts considered in the model include
Situation awareness-catching up, Nonspecific
background tasks, Focused content generation or
review, Light content generation or review, Browsing
documents, Meeting in office, Meeting out of office,
Listening to presentation, Private time,
Family-personal focus, Casual conversation, and
Travel.

The Bayesian network specifies that a user's current 
attention and location are influenced by the user's 
scheduled appointments, the time of day, and the prox- 
imity of deadlines. The probability distribution over 
a user's attention is also influenced by summaries of 
the status of ambient acoustical signals monitored in a 
user's office; segments of the ambient acoustical signal 
over time provide clues about the presence of activity 
and conversation. The status and configuration of soft- 
ware applications and the ongoing stream of user ac- 
tivity generated by a user interacting with a computer 
also provide rich sources of evidence about a user's 
attention. As portrayed in the network, the software 
application currently at top-level focus in the operat- 
ing system influences the nature of the user's focus and 
task, and the status of a user's attention and the appli- 
cation at focus together influence the computer-centric 
activities. Such activity includes the stream of user
activity built from sequences of mouse and keyboard
actions (see Horvitz, Breese, Heckerman et al., 1998 for a
discussion of events and event languages for
monitoring user behavior) and higher-level patterns of
application usage over broader time horizons. Such
patterns include Email-centric and
Word-processor-centric, referring to prototypical classes
of activity involving the way multiple applications are
interleaved.

A more comprehensive Bayesian model for a user's at- 
tentional focus considers key dependencies among vari- 
ables at different periods of time. A dynamic network 
model including a set of Markov temporal dependencies
is portrayed in Figure 2. In real-time use, such
Bayesian models consider information provided by an
online calendar and a stream of observations about
room acoustics and user activity, as reported by an
event sensing system, and continue to provide
inferential results about the probability distribution
over a user's attention.
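
To make the temporal update concrete, the following Python sketch performs one step of forward filtering over attentional states. It is a simplified, hidden-Markov-style stand-in for the dynamic network of Figure 2, not the network itself; the two state names, the transition table, and the observation likelihoods are hypothetical values chosen for illustration.

```python
def filter_step(prior, transition, likelihood):
    """One forward-filtering step over discrete attentional states.

    prior:      dict state -> p(state at previous time step | past evidence)
    transition: dict state -> dict state -> p(next state | previous state)
    likelihood: dict state -> p(current observation | state)
    """
    # Predict: propagate the prior through the Markov transition model.
    predicted = {
        s: sum(prior[r] * transition[r][s] for r in prior) for s in prior
    }
    # Update: weight each state by how well it explains the observation.
    unnormalized = {s: predicted[s] * likelihood[s] for s in predicted}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical two-state example.
prior = {"focused": 0.5, "idle": 0.5}
transition = {
    "focused": {"focused": 0.75, "idle": 0.25},
    "idle": {"focused": 0.25, "idle": 0.75},
}
# Assumed likelihood of observing a burst of keyboard activity.
keyboard_burst = {"focused": 0.75, "idle": 0.25}
posterior = filter_step(prior, transition, keyboard_burst)
```

Repeating this step as each new observation arrives yields a continuously updated distribution over the user's attention.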

3 Expected Cost of Interruption 

Let us assume that the expected utility of relaying 
information contained in an alert to a user can be de- 
composed into the expected costs and benefits of the 
alerting action. For such decomposable utility mod- 
els, we can assume that the utility is the difference 
between the expected costs and benefits of the infor- 
mation provided by the alert. We focus first on the 
expected cost of immediate alerting. 

Alerts and notifications can take the form of audio, 
visual, or a combination of audio and visual channels. 
Beyond the cognitive cost of the immediate distraction 
associated with an alert, visual alerts can obstruct im- 
portant content being accessed or referred to as part 
of the task at hand. The cost associated with an au- 
tonomous notification can depend on the details of the 
rendering of the alert. Thus, in the general case, dis- 
tinct dimensions of cost associated with different no- 
tification designs must be considered in models of in- 
terruption. 

As an example, it may be useful to decompose the cost 
of an alert into the cognitive cost associated with an 
interruption and the cost of obstruction of important 
display real estate. The latter dimension of cost can 
depend significantly on the design of the visual alert 
and the status of displayed information associated with 
the main task at hand. 

A design that overlays a graphical notification over 
content at the center of a user's attention and that 
requires a user to take action to remove the displayed 
alert is more costly than an alert that appears and
disappears autonomously in a timely and elegant manner.
For simplification, we shall merge the cost of interrup- 
tion and the cost of obstruction into a single cost. The 
generality of the methods will not suffer from such a 
coalescence. 

Consider a set of alerting outcomes, $(A_i, F_j)$,
representing the situation where a notification $A_i$
occurs when a user is in a state of attentional focus,
$F_j$. We assess, for each alerting outcome, a cost
function of the form $C^a(A_i, F_j)$, referring to the cost
of being alerted via action $A_i$ when the user is in
attentional state $F_j$. Given uncertainty about a user's
state of attention, the expected cost of alerting (ECA)
a user with action $A_i$ is

$$\mathrm{ECA} = \sum_j C^a(A_i, F_j)\, p(F_j \mid E^a) \tag{1}$$

where $E^a$ refers to evidence relevant to inferring a
user's attention.
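
As a minimal sketch of Equation 1, the following Python function weights a table of alerting costs by an inferred distribution over attentional states. The state names, cost values, and probabilities are illustrative assumptions and are not taken from any assessed model.

```python
def expected_cost_of_alerting(cost_by_focus, p_focus):
    """ECA for one alert type: sum over j of C^a(A_i, F_j) * p(F_j | E^a)."""
    if abs(sum(p_focus.values()) - 1.0) > 1e-9:
        raise ValueError("p_focus must sum to 1")
    return sum(cost_by_focus[f] * p for f, p in p_focus.items())


# Hypothetical costs of an audio alert in each attentional state.
audio_cost = {
    "focused_content_generation": 8.0,
    "browsing_documents": 2.0,
    "casual_conversation": 1.0,
}
# Hypothetical inferred distribution over the user's focus of attention.
p_focus = {
    "focused_content_generation": 0.5,
    "browsing_documents": 0.25,
    "casual_conversation": 0.25,
}
eca = expected_cost_of_alerting(audio_cost, p_focus)
```

A separate cost table would be assessed for each alert design, so that a quieter or less obstructive alert yields a lower ECA for the same distribution over attention.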

4 Expected Cost of Deferring Alerts 

A strategy for reducing the cost associated with alerts 
is to suppress the alerts or to defer them until a period 
of time when the cost of relaying them is smaller. De- 
cisions about deferral must take into consideration the 
cost associated with the delayed review of the infor- 
mation. We now turn to the expected cost associated 
with deferring the review of a notification for some 
time t. 

4.1 Cost of Delayed Action 

We define the criticality of a notification as the 
expected cost of delayed action associated with review- 
ing the message. The expected cost of delayed action 
(ECDA) has been applied in such domains as emer- 
gency medicine (Horvitz & Rutledge, 1991; Horvitz & 
Seiver, 1997) and time-critical aerospace applications 
(Horvitz & Barry, 1995). ECDA is the difference between
the expected value of taking immediate ideal action
(action at time $t_0$) and that of delaying the ideal
action until some future time $t$. Given a probability
distribution, $p(H \mid E)$, over states of the world $H$,
associated with different time criticalities, and a
time-dependent utility function over outcomes,
$u(A_i, H_j, t)$, the expected cost of delayed action for
notifications is

$$\mathrm{ECDA} = \max_{A_i} \sum_j u(A_i, H_j, t_0)\, p(H_j \mid E) - \max_{A_i} \sum_j u(A_i, H_j, t)\, p(H_j \mid E) \tag{2}$$

ECDA provides a conceptual framework for reasoning
about the cost of the delayed review of notifications. 
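
The following Python sketch illustrates the ECDA computation: the best expected utility available at $t_0$ minus the best expected utility available at a later time $t$. The two states, two actions, and the utility function are hypothetical and serve only to show the arithmetic.

```python
def expected_utility(action, t, utility, p_state):
    """Expected utility of taking `action` at time t under p(H | E)."""
    return sum(utility(action, h, t) * p for h, p in p_state.items())


def ecda(actions, utility, p_state, t0, t):
    """Best expected utility at t0 minus best expected utility at t."""
    best_now = max(expected_utility(a, t0, utility, p_state) for a in actions)
    best_later = max(expected_utility(a, t, utility, p_state) for a in actions)
    return best_now - best_later


def utility(action, state, t):
    """Hypothetical time-dependent utility u(A_i, H_j, t)."""
    if action == "ignore":
        return 0.0
    if state == "urgent":
        # The value of responding to an urgent message decays with delay.
        return max(0.0, 10.0 - 2.0 * t)
    return 1.0  # Responding to a routine message has a small fixed value.


p_state = {"urgent": 0.25, "routine": 0.75}
cost_of_delay = ecda(["respond", "ignore"], utility, p_state, t0=0.0, t=2.0)
```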

Let us consider the example of decisions about noti- 
fying users about the arrival of messages via email. 
We must consider the criticality of the email and the 
cost of interruption associated with the user's focus 
of attention. Notification about email includes desk- 
top alerting when the user is working at or near a 
computer and notification via a mobile communica- 
tion device, such as a cell phone or pager, when the 
user is away from a networked computer. 

The utility of reading an email message can diminish 
significantly with delay in reviewing the message. In a 
salient example, delay in reviewing a message that in- 
forms a user about a competitive bidding situation can 
lead to a costly loss of opportunity. Costs of delayed 
review of messages may be high in the context of com- 
munications involving coordination. Important meet- 
ings and deadlines can be missed with delayed review 
of messages. In less severe situations, costs can accrue 
with reductions in the amount of time available to pre- 
pare effectively for a meeting. For such cases, the cost 
of delayed review of messages can be represented by 
loss functions that operate on the amount of time re- 
maining until the meeting being communicated about 
occurs. After a meeting has passed, many options for 
action are eliminated. Thus, the rate of loss incurred
with delays in the review of a message is typically
smaller for periods of time following the occurrence of 
a meeting described in an email message. 

We could attempt to group messages into classes in- 
dexed by the types of action indicated at progressively 
later times and endeavor to formulate a set of out- 
comes associated with ideal actions at different delays 
in reviewing the messages. With such a representation, 
Equation 2 could be used to compute an expected cost 
of delayed review directly. Alternatively, we can sim- 
plify ECDA by considering the probability that a mes- 
sage is a member of one of several criticality classes, 
given features of the messages. We associate with each 
criticality class a time-dependent cost function, de- 
scribing the rate at which losses accrue with delayed 
review of the message. We take t 0 to be the moment 
that email arrives and compute the expected cost for 
delays in reviewing the message until time t. In the 
general case, the costs of delayed review for messages 
in each criticality class may be a nonlinear function of 
delayed review. 

The complexity and scope of communications among
people make the certain identification of the
criticality of email messages difficult. It is more
feasible to pursue inference about a probability
distribution over the criticality of a message given
evidence gleaned from attributes of the message,
including information contained in the header and body
of email messages.

We shall return to explore in detail methods for learn- 
ing the criticality of email messages in Section 5. For 
now, let us assume that each message is a member of 
one of n criticality classes. We further assume that 
each class is associated with a criticality-class-specific 
constant rate of loss that describes the cost of delayed 
review. Using $C^d$ to represent a time-dependent rate
of loss with delay, we can reduce Equation 2 to an
expected cost of delayed review (ECDR),

$$\mathrm{ECDR} = \sum_i (t - t_0)\, C^d(H_i)\, p(H_i \mid E^d) \tag{3}$$

where $t_0$ represents the time a message arrives, $t$ is the
time the message is reviewed, and $E^d$ is evidence used
to infer a probability distribution over the criticality
class, $H$, of a new incoming message at hand. We refer
to the constant rate of loss associated with delayed
review as the expected criticality (EC) of a message,

$$\mathrm{EC} = \sum_i C^d(H_i)\, p(H_i \mid E^d) \tag{4}$$
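
Equations 3 and 4 can be sketched in a few lines of Python. The class names, loss rates, and probabilities below are assumptions for illustration; the units of loss and time are left abstract.

```python
def expected_criticality(loss_rate, p_class):
    """EC: sum over i of C^d(H_i) * p(H_i | E^d)."""
    return sum(loss_rate[h] * p for h, p in p_class.items())


def expected_cost_of_delayed_review(loss_rate, p_class, t0, t):
    """ECDR under constant per-class loss rates: (t - t0) * EC."""
    return (t - t0) * expected_criticality(loss_rate, p_class)


# Hypothetical constant rates of loss per unit time for each class.
loss_rate = {"high": 4.0, "low": 0.5}
# Hypothetical classifier output for one incoming message.
p_class = {"high": 0.25, "low": 0.75}

ec = expected_criticality(loss_rate, p_class)
ecdr = expected_cost_of_delayed_review(loss_rate, p_class, t0=0.0, t=8.0)
```

Because the loss rates are constant, the factor $(t - t_0)$ moves outside the sum, which is why EC alone summarizes the criticality of a message.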

4.2 Ideal Alerting about New Messages 

Users typically review email periodically even when 
their computing systems are configured to suppress the 
active emission of alerts about incoming email. To de- 
velop ideal alerting policies, we consider the cost of 
delayed review of information incurred in a world ab- 
sent of notifications. The cost of delay in such settings 
depends on the criticality of the message and the time 
passing before a user reviews a message without exter- 
nal prompting. 

The expected delay in the review of messages in an 
alert-free setting can be inferred from information 
about the frequency that users will attend to unread 
messages without prompting. Beyond considering the 
frequency that users will review messages on their own, 
we can consider expected delays associated with a pol- 
icy of relaying notifications to users when the cost of 
interruption is inferred to be negligible. With such 
a policy in force, the delay until new messages are 
reviewed can be inferred from information about the 
expected time before a user's attentional resources will 
become freed to review the messages. 

We refer to the time between periods of reviewing new 
messages in the absence of explicit alerts as the inspec- 
tion interval, I. The inspection interval is influenced 
by multiple factors including the user's focus of at- 
tention and location. A user's inspection interval is 
typically reduced when they are at a distance from 
networked computers. 
The Bayesian networks presented in Figures 1 and 2
include a variable representing the inspection interval.
As displayed by the dependency structure of the
models, the variable Inspection Interval is influenced
by User's Attentional Focus and Application
Usage Pattern.

Given a probability distribution over the inspection 
interval, the expected loss associated with reviewing 
messages in an alert-free setting, ECDR', is

$$\mathrm{ECDR}' = \sum_j p(I_j)\,(t_{l-1} + I_j - t_0) \sum_i C^d(H_i)\, p(H_i \mid E^d) \tag{5}$$

where $t_{l-1}$ is the time of last access, $t_0$ is the time a
message has arrived, and $I_j$ is the inspection interval.

The expected value of transmitting an alert (EVTA) 
about a message at some time t before a user reviews 
the email is the increase in the expected utility with 
being informed about the message at t versus at the 
time we expect the user to access the email in the 
absence of an alert. That is, 

$$\mathrm{EVTA} = \sum_j p(I_j)\,(t_{l-1} + I_j - t_0) \sum_i C^d(H_i)\, p(H_i \mid E^d) - \sum_i (t - t_0)\, C^d(H_i)\, p(H_i \mid E^d) \tag{6}$$

A system should relay information about a message 
when the net value of the alert (NEVA) is positive. 
This is the case when the EVTA dominates the imme- 
diate ECA for the type of alert under consideration, 

NEVA = EVTA - ECA (7) 
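
The alerting decision of Equations 5 through 7 can be sketched as follows. The sketch assumes a discrete distribution over inspection intervals, a message characterized by its expected criticality, and times measured in minutes; all numeric values, including the ECA, are hypothetical.

```python
def ecdr_alert_free(p_interval, t_last, t0, ec):
    """Expected loss if the user finds the message without an alert.

    p_interval maps each candidate inspection interval I_j to p(I_j);
    the next unprompted review is expected at t_last + I_j.
    """
    expected_delay = sum(
        p * (t_last + interval - t0) for interval, p in p_interval.items()
    )
    return expected_delay * ec


def evta(p_interval, t_last, t0, t, ec):
    """Value of alerting at time t versus waiting for unprompted review."""
    return ecdr_alert_free(p_interval, t_last, t0, ec) - (t - t0) * ec


def neva(evta_value, eca_value):
    """Net value of the alert: alert only when this is positive."""
    return evta_value - eca_value


# Hypothetical inputs.
p_interval = {30.0: 0.5, 60.0: 0.5}  # inspection interval in minutes
t_last, t0 = 0.0, 10.0               # last access; message arrival
ec = 0.5                             # expected criticality (loss per minute)
value = evta(p_interval, t_last, t0, t=10.0, ec=ec)  # alert on arrival
net = neva(value, eca_value=4.75)
should_alert = net > 0
```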

4.3 Chunking Messages and the Value of 
Alerting 

The grouping together of information from multiple 
messages into a single compound alert can raise the 
value of the content revealed under the guise of a sin- 
gle, but potentially more complex, distraction. Re- 
viewing information about multiple messages in an 
alert can be more costly than an alert relaying infor- 
mation about a single message. We represent such 
increases in distraction by allowing the cost of an alert 
to be a function of its informational complexity. 

Let us assume that the EVTA of an email message is
independent of the EVTA of other email messages. We
use EVTA($M_j$, $t$) to refer to the value of alerting a user
about a single message $M_j$ at time $t$, and ECA($n$) to
refer to the expected cost associated with relaying the
content of $n$ messages. We can modify Equation 7 to
consider multiple messages by summing together the
expected value of relaying information about a set of
$n$ new messages,

$$\mathrm{NEVA} = \sum_{j=1}^{n} \mathrm{EVTA}(M_j, t) - \mathrm{ECA}(n) \tag{8}$$

We note that assuming independence in the value of 
reading distinct messages may lead to an overestima- 
tion of the value of the multiple-message alert because 
strings of messages received in sequence may refer to 
related content. 
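
A short Python sketch shows how chunking can tip the decision. The linear form assumed for ECA($n$) and the per-message EVTA values are hypothetical; the point is only that several messages, none of which justifies an alert alone, can together justify a single compound alert.

```python
def eca_linear(n, base=4.0, per_message=0.5):
    """Hypothetical cost of one alert summarizing n messages."""
    return base + per_message * n


def neva_chunked(evta_values, eca_of_n):
    """Summed EVTA of the pending messages minus ECA(n), as in Equation 8."""
    return sum(evta_values) - eca_of_n(len(evta_values))


# Hypothetical EVTA values for three pending messages.
pending = [1.5, 2.0, 3.0]
individual = [neva_chunked([v], eca_linear) for v in pending]
combined = neva_chunked(pending, eca_linear)
```

Here each message has a negative net value when alerted on its own, while the combined alert has a positive net value under the independence assumption.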

Given inferred probability distributions over a user's 
attentional focus and inspection interval, an assess- 
ment of the costs of distracting a user with alerts, 
and the time criticality of incoming messages, we can 
employ NEVA to continue to reason about the costs 
versus the benefits of alerting users with summariz- 
ing information about the content of newly arriving 
email messages. We now turn to the task of auto- 
matically assigning measures of expected criticality to 
email messages. 

5 Assigning Criticality to Messages 

Building a real-world system for exploiting NEVA to 
control alerting hinges on an ability to automatically 
assign a measure of expected criticality to incoming 
messages. Given the challenge and importance of mak- 
ing inferences about the criticality of alerts, we shall 
dwell on details of inferring the expected criticality 
of email messages. Such methods have application to 
other classes of notifications. 

We have developed an automated criticality classifier 
for email by leveraging and extending learning and in- 
ference methods developed for performing text classi- 
fication. The methodology employs several phases of 
analysis including: (1) selection of features, (2) con- 
struction of a classifier, (3) mapping classifier outputs 
to the likelihood that an email message is a member 
of each criticality class, and (4) the computation of an 
expected criticality from the probability distribution 
over criticality classes for email messages. 

Text classification is an active area of research and
development (see Dumais, Platt, Heckerman et al., 1998
for a review of recent efforts). Machine learning
methods employed in text classification include decision
trees (Lewis & Ringuette, 1994), regression (Yang & 
Chute, 1994), Bayesian models (Lewis & Ringuette, 
1994; Sahami, 1996; Sahami, Dumais, Heckerman et 
al., 1998), and Support Vector Machines (Joachims, 
1998; Scholkopf, Burges, & Smola, 1998). 

Our group has been studying the characteristics and 
performance of several text classification methodolo- 
gies for classifying email including procedures based on 
Bayesian network learning procedures (Sahami,
Dumais, Heckerman et al., 1998) and the Support Vector
Machine learning methodology (Vapnik, 1995; Platt,
1999a). Our studies of standard test corpora (e.g.,
Reuters corpora of business articles) and a variety
of text classification tasks demonstrated that specific
forms of SVM strategies dominated naive Bayes
classification procedures developed to date for text
classification (Dumais, Platt, Heckerman et al., 1998).

Our current implementation of criticality assignment
for email is based on a linear Support Vector Machine
training methodology developed by Platt called
Sequential Minimal Optimization (Platt, 1999a).
Support Vector Machines build classifiers by identifying
a hyperplane that separates a set of positive and
negative examples with a maximum margin (see Platt,
1999a for details). In the linear form of SVM that
we employ to assign criticality classes to email, the
margin is defined by the distance of the hyperplane to
the nearest positive and negative cases for each class.
Maximizing the margin can be expressed as an
optimization problem, and search and optimization thus
lie at the core of different SVM-based training methods.

Traditionally, SVM training methods yield classifiers 
that output a score describing the strength of
membership in a category. Platt has extended SVM methods
by developing a methodology that provides an
estimate of the probabilities that items are members of
different classes (Platt, 1999b). The procedure
employs regularized maximum likelihood fitting to
produce estimates of posterior probabilities. We
harnessed this approach to learn classifiers that output
the probability that an email message is a member of 
different criticality classes. 
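
The mapping from an SVM score to a probability can be illustrated with the sigmoid form used in Platt's method, $p = 1 / (1 + \exp(A f + B))$, where $f$ is the classifier score. In the sketch below the parameters `a` and `b` are hypothetical placeholders; in practice they would be fit to held-out data by regularized maximum likelihood, a step that is not shown.

```python
import math


def platt_probability(score, a, b):
    """Map a raw SVM score f to p(class | f) = 1 / (1 + exp(a * f + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))


# Hypothetical sigmoid parameters; a is negative so that larger scores
# map to larger probabilities of membership in the high criticality class.
A, B = -2.0, 0.0
p_boundary = platt_probability(0.0, A, B)   # score on the hyperplane
p_confident = platt_probability(1.5, A, B)  # score well inside the class
```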

In practice, we create a set of criticality classes and as- 
sess time-dependent cost functions for each class. We 
obtain a training set by manually partitioning a cor- 
pus of sample messages into distinct criticality classes. 
Given a training corpus of messages labeled by criti- 
cality, we first apply feature-selection procedures that 
attempt to find the most discriminatory features for 
the set of target classes, using several phases of
analysis including a mutual-information analysis (Koller &
Sahami, 1996; Dumais, Platt, Heckerman et al., 1998;
Sahami, Dumais, Heckerman et al., 1998). We refer
readers to the text-classification literature for details 
on practical and theoretical issues in feature selection. 
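
As one illustration of a mutual-information analysis, the Python sketch below scores a single binary feature (for example, whether a phrase occurs in a message) against criticality labels. It is a generic textbook computation, not the specific procedure of the cited work, and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_present, labels):
    """Mutual information, in bits, between a feature and the class label."""
    n = len(labels)
    joint = Counter(zip(feature_present, labels))
    feature_counts = Counter(feature_present)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        p_f = feature_counts[f] / n
        p_c = label_counts[c] / n
        mi += p_fc * math.log2(p_fc / (p_f * p_c))
    return mi


labels = ["high", "high", "low", "low"]
# A feature that perfectly separates the classes carries one full bit.
informative = mutual_information([True, True, False, False], labels)
# A feature that is independent of the class carries no information.
uninformative = mutual_information([True, False, True, False], labels)
```

Ranking candidate words and phrases by such a score and keeping the top-scoring ones is one simple way to choose discriminatory features.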

Feature selection procedures for text classification can 
operate on single words or higher-level distinctions 
made available to the algorithms, such as phrases and 
parts of speech tagged with natural language
processing. Basic feature selection algorithms for text
classification typically perform a search over single
words. Beyond the reliance on single words, we can
make available to feature selection procedures
domain-specific phrases and high-level patterns of features,
including general expressions that operate on classes
of words and other features in email messages. We
found that providing such special tokens to
text-classification procedures can enhance classification
significantly (Sahami, Dumais, Heckerman et al., 1998).

In investigating the construction of classifiers for email 
criticality, we identified special phrases and other 
classes of observations that we suspected could be of 
value for discriminating among email messages assoc- 
iated with different time criticalities. The handcrafted 
features are considered during feature selection. To- 
kens and patterns of value in identifying the criticality 
of messages include such distinctions as: 

• Sender: Single person versus an email alias, peo- 
ple at a user's organization, organizational rela- 
tionship to user, names included on a user con- 
structed list, people user has replied to 

• Recipients: Sent only to user, sent to a small num- 
ber of people, sent to a mailing list 

• Time criticality: Inferred time of an implied
meeting, language indicating cost with delay, including
such phrases as "happening soon," "right away,"
"as soon as possible," "need this soon,"
"deadline is," "by time, date," etc.

• Past tense: Phrases used to refer to events that 
have occurred in the past such as, "we met," 
"meeting went," "took care of," "meeting yester- 
day," etc. 

• Future tense: Phrases used to refer to events that
will occur in the future including "this week,"
"Are you going to," "when are you," etc.

• Future dates: Days and times representing future 
dates. 

• Coordination: Language used to refer to coor- 
dinative tasks such as "get together," "can we 
meet," "coordinate with," etc. 

• Personal requests: Phrases associated with direct 
requests for assistance, including sentences ending 
with question marks, "will you," "are you," "can 
you," "I need," "take care of," "need to know," 
etc. 

• Importance: Language and symbols referring to 
importance including the presence of an explicit 
high or low priority flag, and such phrases as "is 
important," "is critical," etc. 

• Length of message: Size of new component of a
message (excluding the forwarded thread)

• Presence of attachments: Noting the inclusion of
documents in the email

• Time of day: The time a message was composed

• Signs of junk email: Patterns such as percent
nonalphanumeric characters, pornographic content,
and marketing phraseology such as "Free!,"
"Only $," "Limited offer," etc.

Figure 3: Discriminatory power of an email criticality
classifier. The curve indicates the probability of
misclassification at different decision thresholds for a
test set of hand-selected messages in high and low
criticality classes.

We found that the coupling of an SVM classifier with 
criticality-specific tokens can effectively classify email 
into criticality classes and into overall estimates of 
expected criticality. In an evaluation, a criticality clas- 
sifier was trained from approximately 1500 messages, 
divided into approximately equal sets of low and high 
priority email messages. A curve showing the ability
of the classifier to classify messages from a test corpus
consisting of 250 high and 250 low priority messages,
selected by a user from a large inbox, is displayed in
Figure 3. The receiver operating characteristic (ROC)
curve displays the probability of high priority email
being classified as low priority email and the
probability of low
priority email being classified as high priority email 
for different values of the probability threshold used 
to define the high and low criticality message classes. 
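
The handcrafted token classes listed earlier in this section can be made available to a classifier as binary features. The following Python sketch shows one way this might look using regular expressions; the pattern set covers only a few of the listed phrases, and the function and feature names are hypothetical.

```python
import re

# A small, illustrative subset of the phrase classes described above.
PATTERNS = {
    "time_criticality": re.compile(
        r"\b(right away|as soon as possible|happening soon|deadline is)\b",
        re.IGNORECASE,
    ),
    "coordination": re.compile(
        r"\b(get together|can we meet|coordinate with)\b", re.IGNORECASE
    ),
    "personal_request": re.compile(
        r"\b(will you|can you|i need|need to know)\b", re.IGNORECASE
    ),
    "junk": re.compile(r"free!|only \$|limited offer", re.IGNORECASE),
}


def extract_tokens(body, recipients):
    """Return binary features for one message body and recipient list."""
    features = {name: bool(p.search(body)) for name, p in PATTERNS.items()}
    # Recipient-based distinction: was the message sent only to the user?
    features["sent_only_to_user"] = len(recipients) == 1
    return features


message = "Can we meet today? The deadline is Friday, so I need the draft right away."
features = extract_tokens(message, ["user@example.com"])
```

Such features would be offered to feature selection alongside single words, leaving it to the selection procedure to decide which ones discriminate among the criticality classes.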

Although it is useful to demonstrate the ability of the 
classifier to appropriately label cases of low and high 
criticality email, we are most interested in the use of 
the inferred probabilities of membership in alternate 
classes to compute the expected criticality of messages, 
and in the ultimate use of such information in comput- 
ing the NEVA associated with messages. 



As part of the validation of the automated assign- 
ment of measures of criticality for email, we gener- 
ated expected criticalities of email messages, assum- 
ing a linear cost of delay with time for each criticality 
class, and summing the costs for each class weighted 
by the probability that messages are members of each 
class as reported by the classifier. Our validations have 
shown that the classifier performs well even with the 
use of only two classes of criticality: time-critical mes- 
sages and normal/low priority messages. In a valida- 
tion study, one of the authors scored the criticality of 
messages by hand on a 1 to 100 scale, using 1 to indi- 
cate the messages of lowest criticality and 100 to rep- 
resent the most time-critical messages. To probe the 
effectiveness of the expected criticality measure, we 
computed correlation coefficients and generated scat- 
ter plots to visualize relationships between the assessed 
criticalities and the computed expected criticality. In 
a sample study, one of the authors assessed the criti- 
cality of 200 email messages received over three days. 
A correlation coefficient of 0.9 was found between the 
user-tagged criticality and the automated assignment
of expected criticality. 
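
For readers who wish to run a comparable check on their own data, the Pearson correlation coefficient between hand-assigned scores and computed expected criticalities can be calculated as in the sketch below. The sample values are hypothetical and are not data from the study described above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical hand-assigned scores (1 to 100) and computed criticalities.
hand_scores = [10.0, 40.0, 70.0, 95.0]
computed = [0.5, 2.0, 3.5, 4.75]
r = pearson_r(hand_scores, computed)
```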

6 Priorities Prototypes 

We have been exploring the use of attention manage- 
ment for email messages through implementations of 
several prototypes we refer to as the Priorities fam- 
ily of systems. The Priorities prototypes learn clas- 
sifiers from examples drawn from a user's email and 
apply the classifiers in real time to assign expected 
criticalities to incoming email messages. The systems 
work with the MS Outlook 2000 messaging and cal- 
endar system. During feature selection, the systems 
consider categories of features described in Section 5. 

The classification learning and inference procedures 
have been integrated in a software application that 
calls the Microsoft Exchange MAPI and Outlook 2000 
CDO interfaces. These services grant the system ac- 
cess to details of the message header, including sender 
and recipient information, and the organizational hier- 
archy at Microsoft. When email arrives, the real-time 
classifier examines the incoming messages for words 
and phrases and makes calls to acquire sender, recipi- 
ent, and organizational information. 

An early version of Priorities has been distributed 
widely at Microsoft for real-world testing. This ver- 
sion assigns a measure of expected criticality to all in- 
coming mail, using a pretrained, default classifier or a 
classifier that is custom-trained by the onboard learn- 
ing subsystem. The system has been integrated with 
the MS Research Eve event sensing system, developed 
as part of the Lumiere intelligent interface project 
(Horvitz, Breese, & Heckerman et al., 1998), enabling 
the system to continue to consider a variety of obser- 
vations, including keyboard and mouse activity, and 
room acoustics. Information about a user's schedule 
is accessed directly from Outlook's online calendar. 

The version of the Priorities system that is currently 
being tested by users at Microsoft provides an email 
viewer client that displays email sorted by criticality 
and scoped by a user-specified period of time. The 
Priorities client is shown in Figure 4. 
The prototype can be instructed to take a variety of 
actions based on observations about the user's activ- 
ity and location, and the inferred expected criticality 
of incoming mail. Actions include playing criticality- 
specific sounds that were specially composed for the 
system, bringing the client to the foreground, and 
opening email messages and sizing and centering the 
email according to criticality. The system can be di- 
rected to perform a variety of automated forwarding 
and response services based on expected criticality. 
Moving beyond the desktop, the system has the abil- 
ity to forward messages to a user's cell phone or pager 
based on criticality and the time a user is away from 
the office. For mobile settings associated with limited 
time and bandwidth, Priorities can be employed to 
download messages in order of expected criticality. 
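
Two of the criticality-directed behaviors above, downloading in order of expected criticality and forwarding to a mobile device, can be sketched as follows. This is an illustrative sketch only: the `Message` type, the 0 to 100 criticality scale, and both thresholds are assumptions introduced here, not details of the Priorities implementation.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    expected_criticality: float  # assumed scale: higher means more time-critical


def download_order(messages):
    """Return messages sorted so the most critical are downloaded first."""
    return sorted(messages, key=lambda m: m.expected_criticality, reverse=True)


def should_forward(message, minutes_away,
                   criticality_threshold=80.0, away_threshold_minutes=30.0):
    """Forward to a phone or pager only if the message is critical enough
    and the user has been away long enough (both thresholds hypothetical)."""
    return (message.expected_criticality >= criticality_threshold
            and minutes_away >= away_threshold_minutes)


inbox = [
    Message("Lunch menu", 10.0),
    Message("Server down", 95.0),
    Message("Meeting moved", 50.0),
]
for msg in download_order(inbox):
    print(msg.subject, should_forward(msg, minutes_away=45))
```

A fixed pair of thresholds is the simplest possible policy; a user-configurable rule could instead let the criticality threshold fall as time away from the office grows.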

A more advanced version of Priorities, named 
Priorities-Attend, serves as our testbed for 
performing more sophisticated inference about a user's 
attention and for making decisions about notification 
based on NEVA. This version has been integrated 
with a manually constructed Bayesian network that 
performs inference about a user's attention. Work 
is underway on the development of effective assess- 
ment techniques and richer models for representing 
and reasoning about a user's attention and the costs 
of interruption. Our experiences to date with the 
use of automated alerting machinery suggest that a 
decision-theoretic approach to alerting can fundamen- 
tally change the way users work with email communi- 
cations. 
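
The kind of decision such a system makes can be sketched as below. This is a simplified reading, not the formulation used in Priorities-Attend: it treats the value of alerting now as the expected cost of deferral that is avoided, and subtracts an expected cost of interruption taken over a probability distribution on attention states. The state names, probabilities, and costs are hypothetical.

```python
def expected_cost_of_interruption(attention_probs, interruption_costs):
    """Weight the cost of interrupting in each attention state by its probability."""
    return sum(p * interruption_costs[state] for state, p in attention_probs.items())


def net_value_of_alerting(cost_of_deferral, attention_probs, interruption_costs):
    """Simplified net value: cost of deferral avoided minus expected interruption cost."""
    return cost_of_deferral - expected_cost_of_interruption(
        attention_probs, interruption_costs)


# Hypothetical output of an attention model and assumed per-state costs.
attention_probs = {"focused": 0.7, "idle": 0.3}
interruption_costs = {"focused": 10.0, "idle": 1.0}

value = net_value_of_alerting(8.1, attention_probs, interruption_costs)
print("alert now" if value > 0 else "defer")
```

In this sketch the alert is relayed only when the net value is positive; otherwise the message is held until the user's inferred attention state or the accumulated cost of delay changes the balance.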

7 Summary 

We have described efforts to harness decision-theoretic 
principles to control alerting in computing and com- 
munication systems. We presented attention-sensitive 
procedures for computing the net expected value of 
alerts. We framed the discussion with the task of relay- 
ing notifications about incoming email messages. Af- 
ter presenting principles for decisions about alerting 
users about messages, we presented work on 
automatically assessing the expected criticality of email 
messages. Finally, we presented work on the Priorities 
systems, prototypes that operate with the Microsoft 
Outlook email and scheduling application. 

Figure 4: Display provided by the client of a version of 
Priorities being tested by multiple users. The client 
comes into view upon demand or when criticality-directed 
policies bring it to the foreground. 

There are numerous opportunities for enhancing the 
value of computing systems through harnessing meth- 
ods that perform ongoing inference about a user's at- 
tention and about the criticality of different sources 
of information. We are continuing our pursuit of 
decision-theoretic machinery that can endow operating 
systems with the ability to monitor multiple sources of 
information and make intelligent decisions about the 
expected value of transmitting notifications to users. 